Probabilistic Record Linkage for Genealogical Research
Abstract
The slowest and most tedious task in genealogical research is searching civil or church records for information about an individual, yet it is an essential step. By searching multiple sources, such as census records, wills, deeds, and birth and death records, a researcher can compile a more complete set of information about an individual and potentially reconstruct that individual's pedigree. When records are stored electronically, modern methods of probabilistic record linkage can combine, or link, all the information on an individual from various sources in seconds, rather than requiring days or weeks of arduous searching by a genealogist. Researchers in England and Canada and at the U.S. Census Bureau developed the theory of probabilistic record linkage to help construct pedigrees of individuals from vital records in order to track hereditary diseases. However, probabilistic record linkage has yet to be widely applied to most of the sources used in everyday genealogical research. This paper summarizes the results of two Master's projects in the Department of Statistics at Brigham Young University. We describe the approach to probabilistic record linkage used by the Family History Department of The Church of Jesus Christ of Latter-day Saints in TempleReady, and demonstrate its application to genealogical research using a set of civil and church records of Quakers in Perquimans and Pasquotank Counties, North Carolina. The results of our study are very promising: probabilistic record linkage has the potential to dramatically increase the productivity of genealogical researchers. Although complete automation of genealogical research is still a long way off, probabilistic record linkage could revolutionize the way research is done. This paper reports on work in progress, describing what has been done to date and outlining some of the many tasks yet to be addressed.
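The abstract does not spell out the linkage model, so the following is only a minimal sketch of the Fellegi-Sunter weighting scheme, a standard formulation of probabilistic record linkage; it is not the TempleReady implementation. Each compared field contributes a log-likelihood weight, log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m is the probability of agreement for a true match and u for a non-match. The field names, the m/u values, the exact-equality comparison, and the link/non-link thresholds are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    m: float  # P(field agrees | both records describe the same person)
    u: float  # P(field agrees | records describe different people)


# Hypothetical m/u probabilities, chosen only for illustration.
PARAMS = {
    "surname": FieldParams(m=0.95, u=0.01),
    "given_name": FieldParams(m=0.90, u=0.05),
    "birth_year": FieldParams(m=0.85, u=0.02),
    "county": FieldParams(m=0.90, u=0.30),
}


def field_weight(agrees: bool, p: FieldParams) -> float:
    """Log2 likelihood ratio contributed by a single field comparison."""
    if agrees:
        return math.log2(p.m / p.u)
    return math.log2((1 - p.m) / (1 - p.u))


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Total weight, summing fields as if they were conditionally independent."""
    total = 0.0
    for name, p in PARAMS.items():
        a, b = rec_a.get(name), rec_b.get(name)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        total += field_weight(a == b, p)  # exact equality; a simplification
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Three-way decision rule with hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link (clerical review)"


if __name__ == "__main__":
    # Two made-up records, e.g. a census entry and a meeting-minutes entry.
    census = {"surname": "Smith", "given_name": "Ann",
              "birth_year": 1760, "county": "Perquimans"}
    minutes = {"surname": "Smith", "given_name": "Ann",
               "birth_year": 1761, "county": "Perquimans"}
    w = match_weight(census, minutes)
    print(f"weight = {w:.2f} bits -> {classify(w)}")
```

With these made-up parameters, the two example records agree on surname, given name, and county but differ by one year in birth year, so their total weight (about 9.6 bits) falls between the thresholds and the pair would be sent to clerical review. A working system would typically estimate m and u from the data (for example with the EM algorithm) and use approximate string comparators rather than exact equality.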
Similar resources
Genealogical Record Linkage: Features for Automated Person Matching
This paper provides a high-level overview of how automatic person matching (genealogical record linkage) algorithms can be developed, and then provides a detailed explanation of many of the features used by FamilySearch in doing person matching. Empirical results show a dramatic improvement in accuracy by using these features trained with neural networks, when compared to traditional probabilis...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...
PROBABILISTIC METHODOLOGY FOR RECORD LINKAGE DETERMINING ROBUSTNESS OF WEIGHTS
Over time, the world population has developed a desire to research their ancestral lineage. Many resources have been identified to aid an individual in genealogical research. In the United States, one of the greatest resources for researching genealogy is census records. Census records allow a genealogical researcher to track individuals over time, broadening the scope of information one can ac...
Reconstructing historical populations from genealogical data: an overview of methods used for aggregating data from GEDCOM files
The GEDCOM file format is by far the most widely used means of exchanging genealogical data and extensive collections of these files are available online. There is a huge potential benefit for historians and other academics who are able to make use of the data contained in available GEDCOM files, as these effectively represent hundreds of thousands of hours of crowdsourced work and a considerab...
Utilizing Stacking for Feature Reduction in Graph-Based Genealogical Record Linkage
Genealogy research is centered on collecting records about an individual from various sources and combining the information to gain a larger historical perspective about that individual, commonly in the form of a pedigree. Data extraction, the internet, and other technological advancements have made large amounts of digital genealogical data more accessible. Discovering the relevancy of a digit...
Publication date: 2014